Abstract. Szemerédi's regularity lemma is a basic tool in graph theory,
and also plays an important role in additive combinatorics, most notably
in proving Szemerédi's theorem on arithmetic progressions [19], [18]. In
this note we revisit this lemma from the perspective of probability theory
and information theory instead of graph theory, and observe a slightly
stronger variant of this lemma, related to similar strengthenings of that
lemma in [1]. This stronger version of the regularity lemma was extended
in [21] to reprove the analogous regularity lemma for hypergraphs.



1. Introduction 

Szemeredi's regularity lemma, introduced by Szemeredi in jJU], is a funda- 
mental tool in graph theory, and more precisely in the theory of very large, 
dense graphs. Roughly speaking, it asserts that given any such large dense 
graph $G$, and given an error tolerance $0 < \varepsilon \ll 1$, one can approximate $G$
by a much simpler object, namely a partition of the vertex set into $O_\varepsilon(1)$
classes, together with some edge densities between atoms of this partition,
such that the approximation is "$\varepsilon$-regular" on most pairs of this partition; we
will formalize these notions shortly. This lemma can thus be viewed as a
structure theorem for large dense graphs, approximating such graphs to any 
specified accuracy by objects whose complexity is bounded independently 
of the number of vertices in the original graph. 

The regularity lemma has had many applications in graph theory, computer
science, discrete geometry and additive combinatorics; see [10] for a
survey. In particular, this lemma and its variants play an important role in
Szemerédi's celebrated theorem [19] that any subset of the integers of positive
density contains arbitrarily long arithmetic progressions. A variant of
this structure theorem (also borrowing heavily from ideas in ergodic theory)
was also crucial in showing in ;rT' that the primes contain arbitrarily long
arithmetic progressions. The lemma has also had a number of generaliza- 
tions to hypergraphs of varying degrees of strength, see 0, [1], [3], |^, [Tl| . 
jl5j , [5] , [U . The more recent formulations of the hypergraph lemma are in 
fact strong enough to rather easily imply Szemeredi's theorem on arithmetic 
progressions, as well as a multidimensional version due to Furstenberg and 



The author thanks Fan Chung Graham for helpful comments, and Jozsef Solymosi for 
encouraging the creation of this manuscript. The author is also indebted to the anonymous 
referees for many useful suggestions and corrections. The author is supported by a grant 
from the Packard Foundation. 




TERENCE TAO



Katznelson [Jj. They were also used in the recent paper P^l establishing
infinitely many constellations of any given shape in the Gaussian primes. 

The proof of Szemeredi's lemma is now standard in the literature. How- 
ever, this standard proof is difficult to extend to the hypergraph case; a 
direct application of the argument does give fairly easily a regularity lemma 
for hypergraphs (see [H]), but that lemma does not seem to be strong
enough for applications such as Szemerédi's theorem or the Furstenberg-
Katznelson theorem^1, except when concerning progressions or constellations
consisting of at most three points (see [T7]). 

In this paper we shall present a slightly different way of looking at
Szemerédi's regularity lemma, which we used in [21] to obtain a hypergraph
regularity lemma with sufficient strength for applications to Szemeredi-type 
theorems. In this new perspective, one views the regularity lemma not as a 
structure theorem for large dense graphs, but rather as a structure theorem 
for events or random variables in a product probability space. This change 
of perspective is analogous to Furstenberg's highly successful approach to 
Szemeredi's theorem in [Hj, in which the purely combinatorial result of Sze- 
meredi was recast as a statement about recurrence for arbitrary events or 
random variables in a probability-preserving system. Just as Furstenberg's 
change of perspective allowed the powerful techniques of ergodic theory to 
be brought to bear on the problem, the change of perspective here allows one 
to employ tools from probability theory and information theory to clarify 
the regularity lemma. In particular we will use three very useful concepts 
from those theories, namely σ-algebras (partitions), conditional expectation
(relative density), and entropy (complexity). As the parenthetical comments 
suggest, each of these concepts has a combinatorial analogue; however, the
author believes that there is some conceptual advantage to be gained by
using a probabilistic and information-theoretic perspective rather than a
graph-theoretic one^2. One byproduct of this new perspective is that one
discovers a stronger and more flexible version of the regularity lemma hid- 
ing underneath the standard one. This stronger version is difficult to state 
here without the requisite notational setup, but let us just say for now that 



^1 The difficulty is that in the hypergraph situation, there are several levels of regularity
or discrepancy that need to be controlled in order to yield a useful bound for arithmetic
progressions or similar structures, and the lemma in [2] or [3] controls only one of these
discrepancies. Later regularity lemmas control all of the relevant discrepancies, but there 
are some non-trivial technical issues concerning the relative sizes of the error estimates, as 
certain losses coming from one level of approximation must be compensated for by gains 
from the discrepancy bounds in other levels of approximation. 

^2 The situation is somewhat analogous to that of the probabilistic method in
combinatorics. While every probabilistic argument could, in principle, be written in a deterministic
way (replacing expectations by averages, etc.), it is undeniable that there are significant 
conceptual benefits in using a "probabilistic way of thinking" to approach combinatorial 
problems. 
it is closely related to a similar improvement of the regularity lemma
discovered recently^3 in [1], in which it was observed that the regularity of
the large dense graph $G$ relative to the partition given by that lemma can
be vastly improved after adding or removing a small number of edges from
$G$. This strengthened version of the regularity lemma turns out to be quite
amenable to iteration, and thus gives a relatively painless proof of the
hypergraph regularity lemma; see [21].

We will turn to the details in later sections, but for now let us just give an 
informal discussion which already shows that the regularity lemma can be 
viewed in information theoretic terms rather than graph theoretic terms. It 
will be convenient to work with bipartite graphs. Let $G = (V_1, V_2, E)$ be a
large dense bipartite graph. Let $x_1$ and $x_2$ be two vertices selected
independently and uniformly at random from $V_1$ and $V_2$ respectively; thus $x_1$ and $x_2$
are independent random variables, taking values in $V_1$ and $V_2$ respectively.
The edge set $E$ can now be re-interpreted as a probabilistic event, namely
the event that the pair $(x_1, x_2)$ lies in $E$. We shall abuse notation and refer
to this event also as $E$; thus $E$ is now some event determined by the random
variables $x_1, x_2$ (or more precisely, it lies in the σ-algebra generated by the
random variables $x_1$ and $x_2$). Many of the important statistics about the
edge set $E$ can now be recast in terms of the event $E$; for instance, the
edge density of the edge set $E$ is equal to the probability of the event $E$, or
equivalently the expectation of the indicator random variable $1_E$. Similarly
one can view relative edge densities of $E$ as conditional expectations of $1_E$.

We have already observed that $E$ is, in principle, determined by $x_1$ and
$x_2$. However, from an information-theoretic perspective this determinism
relationship can be very "high-complexity" or "fine-scaled", in a sense we
shall describe shortly. If the vertex sets $V_1, V_2$ each have $N$ elements, then the
random variables $x_1$ and $x_2$ have a Shannon entropy of $\log_2 N$ (they can be
described by roughly $\log_2 N$ bits each). On the other hand, the event $E$ (or
the Boolean function $1_E$) has a Shannon entropy of at most $\log_2 2 = 1$ (it
can be described by one bit). If $N$ is very large, we thus see that there is
much more information contained in the random variables $x_1$ and $x_2$ than
is contained in the event $E$. To put it another way, knowing that the event
$E$ is true or false (i.e., that the pair $(x_1, x_2)$ is an edge in $G$ or not) does not
even begin to let one determine the exact values of $x_1$ and $x_2$. Indeed, in the
extreme case when the graph $G$ is a random (or pseudorandom) graph, the
event $E$ behaves almost as if it were independent of the random variables $x_1$
and $x_2$, despite being actually determined by these variables. More precisely,
if $A_1$ is any event determined by $x_1$ (thus $A_1$ can be thought of as the event
that $x_1$ lies in a fixed subset of $V_1$, which by abuse of notation we shall
also call $A_1$), and $A_2$ is any event determined by $x_2$, then in the random
or pseudorandom case the event E will be almost completely uncorrelated 

^3 Note added in proof: a closely related version of this lemma was recently introduced
in PP, [21 . See also for yet another perspective on the regularity lemma, this time 
from functional analysis. 
with the events $A_1, A_2$. This corresponds to the well-known fact that when
$G$ is a random or pseudorandom graph, the relative edge density between
two large sets $A_1, A_2$ in $V_1, V_2$ will, with high probability, be very close to
the global edge density of $G$. (Note that if $A_1$ and $A_2$ were small sets, i.e.
events of very low probability, then the correlation, or more precisely the
mutual information, with $E$ would automatically be small.)
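The claim that large vertex sets see roughly the global edge density in a random graph can be checked numerically. The following is only a minimal sketch under stated assumptions: the graph size, the edge probability, the seed, and the helper names `edge_density` and `random_bipartite_graph` are illustrative choices, not taken from the text.

```python
import random

def edge_density(edges, rows, cols):
    """Fraction of pairs (x1, x2) in rows x cols that are edges."""
    hits = sum(1 for x1 in rows for x2 in cols if (x1, x2) in edges)
    return hits / (len(rows) * len(cols))

def random_bipartite_graph(n, p, rng):
    """Edge set of a random bipartite graph between two copies of {0, ..., n-1}."""
    return {(x1, x2) for x1 in range(n) for x2 in range(n) if rng.random() < p}

rng = random.Random(0)              # fixed seed (arbitrary, for reproducibility)
n, p = 200, 0.5                     # illustrative parameters, not from the text
E = random_bipartite_graph(n, p, rng)
A1 = rng.sample(range(n), n // 2)   # a "large" subset of V1
A2 = rng.sample(range(n), n // 2)   # a "large" subset of V2

global_density = edge_density(E, range(n), range(n))
relative_density = edge_density(E, A1, A2)
print(global_density, relative_density)
```

With these sizes the two printed densities should agree to within a few percent; shrinking `A1` and `A2` to a handful of vertices makes the agreement much less reliable, in line with the parenthetical note above.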

Let us summarize the above discussion in information-theoretic terms. If
one is given all $\log_2 N$ bits of $x_1$, and all $\log_2 N$ bits of $x_2$, then the single-bit
event $E$ is completely determined. But if $G$ is random or pseudorandom,
and one is only given one bit of $x_1$ (specifically, whether $x_1$ lies in a fixed set
$A_1$) and one bit of $x_2$, one learns almost no information about the bit $E$. Let
us informally describe this by saying that $E$ is approximately independent of
$x_1$ and $x_2$ at "coarse scales", when only a few bits of $x_1$ and $x_2$ are known,
even though $E$ is determined by $x_1$ and $x_2$ at "fine scales", when most or
all of the bits of $x_1$ and $x_2$ are known.

Of course, if $G$ is not pseudorandom, then $E$ can be highly correlated
with a few special bits of $x_1$ and $x_2$. To take an extreme opposite case
to the pseudorandom case, suppose that $G$ is a complete bipartite graph
connecting all the vertices of a set $A_1 \subset V_1$ to those of a set $A_2 \subset V_2$, and
not connecting any other pairs of vertices. Then the event $E$ is completely
determined by one bit of $x_1$ (namely, whether it lies in $A_1$) and one bit of
$x_2$ (namely, whether it lies in $A_2$).

Furthermore, it is possible for $G$ to be a hybrid between these two
extremes. Suppose now that $G$ is a pseudorandom subgraph of the complete
bipartite graph connecting $A_1$ to $A_2$. Then $E$ is no longer determined by
the one special bit of $x_1$ associated to $A_1$, and the one special bit of $x_2$
associated to $A_2$. However, it is now approximately independent at coarse
scales of $x_1$ and $x_2$, conditioning on $A_1$ and $A_2$. In other words, once the
events $A_1$ and $A_2$ are known to be true or false, the event $E$ is then
approximately independent of any further bits of information arising from $x_1$
and $x_2$. In graph theory terms, this means that when restricting $V_1$ to $A_1$
or its complement, and restricting $V_2$ to $A_2$ or its complement, the induced
subgraph of $G$ behaves pseudorandomly (with some edge density depending
on which sets were being restricted to).

The information-theoretic version of the Szemerédi regularity lemma is
an assertion, roughly speaking, that every event is a hybrid of the two
extremes in the sense given above. Very informally, given any two
high-entropy random variables $x_1$ and $x_2$, and given any event $E$, it is possible
to find some low-entropy random variable $Z_1$ determined by $x_1$, and a
low-entropy random variable $Z_2$ determined by $x_2$, such that $E$ is approximately
independent of $x_1$ and $x_2$ conditioning on $Z_1$ and $Z_2$. Again being very
informal, this means that there exist a small number of bits from $x_1$ and
$x_2$ which correlate with $E$, and such that no further bits from $x_1$ and $x_2$
have much of a correlation with $E$. Interestingly, this formulation of the
regularity lemma requires no independence properties of $x_1$ and $x_2$, and
also does not require $E$ to be determined by $x_1$ and $x_2$; but we do not know
any applications of this more general version.

One can view the low-entropy random variables $Z_1, Z_2$ discussed above
as "approximations" to the event $E$, where the approximation is in some
coarse information-theoretic sense. It turns out that the proof of the
regularity lemma (see Lemma 4.3 below) in fact yields two such approximations,
a "coarse approximation" $Z_1, Z_2$ and a "fine approximation" $Z_1', Z_2'$. The
coarse approximation has low entropy. The fine approximation has
significantly higher entropy, but it is an exceedingly accurate approximation to
$E$; in particular, the accuracy of this approximation can be made to outweigh any
losses coming from the entropy of the coarse approximation, in a way which
can be made precise using a "growth function" $F : \mathbb{R}^+ \to \mathbb{R}^+$. Finally, the
coarse and fine approximations will be close to each other, both in an $L^2$
sense, and also in an information-theoretic sense. We will make these
statements more precise later; however, we remark for now that the presence of the
new parameter $F$, used to compare the accuracy of the fine approximation
against the entropy of the coarse approximation, is very suitable for iteration
purposes, and allows one to extend the regularity lemma to the hypergraph
setting, in which one has multiple random variables $x_1, \ldots$ instead of
just two, and furthermore one is interested in low-entropy approximations
to an event which arise not only from individual random variables $x_j$, but
also from joint random variables such as $(x_i, x_j)$ (and the approximations
coming from the joint random variables should themselves be approximated
by other, lower-order random variables). See [21]. A closely related
regularity lemma, which also involves an arbitrary growth function $F$, has also
recently appeared in in applications to property testing. 

2. A PROBABILISTIC FORMULATION 

Before we give the rigorous information-theoretic version of the
Szemerédi regularity lemma, let us first give a standard formulation of the
lemma, and also a probabilistic formulation which can be viewed as an
intermediate formulation bridging the graph-theoretic version and the
information-theoretic^4 version of the lemma. We begin with the graph-theory version;
again, it is convenient to restrict one's attention to bipartite graphs.

We use $O(X)$ to denote any quantity bounded in magnitude by $CX$ for
some absolute constant $C > 0$, and more generally we use $O_{a_1,\ldots,a_k}(X)$
to denote any quantity bounded in magnitude by $C(a_1,\ldots,a_k)X$, where
$C(a_1,\ldots,a_k) > 0$ depends on the parameters $a_1,\ldots,a_k$. We also use $|A|$ to
denote the cardinality of a finite set $A$.

^4 We say a formulation is "probabilistic" if it involves such concepts as probability
spaces, σ-algebras, random variables, (conditional) expectation, and correlation. We say
a formulation is "information-theoretic" if it involves such concepts as probability spaces,
σ-algebras, random variables, (conditional) entropy, and mutual information. Clearly
these two perspectives share much in common, for instance the concept of independence 
is important in both. 
Definition 2.1. A bipartite graph is a triplet $(V_1, V_2, E)$ where $V_1, V_2$ are
two finite non-empty sets, and $E \subset V_1 \times V_2$. If $\varepsilon > 0$, we say that a bipartite
graph $(V_1, V_2, E)$ is $\varepsilon$-regular if we have

(1)  $|E \cap (A_1 \times A_2)| = \frac{|E|}{|V_1 \times V_2|} |A_1 \times A_2| + O(\varepsilon |V_1 \times V_2|)$

for all $A_1 \subset V_1$ and $A_2 \subset V_2$.

Remark 2.2. While we assert that (1) holds for all subsets $A_1, A_2$ of $V_1, V_2$,
this condition is only non-trivial for large subsets; it holds trivially when
$|A_1 \times A_2| = O(\varepsilon |V_1 \times V_2|)$. Thus this definition of $\varepsilon$-regularity is essentially
equivalent to other formulations of regularity in the literature in which a
lower bound is imposed on the size of $A_1$ and $A_2$.
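For very small graphs, the regularity condition of Definition 2.1 can be tested by brute force over all pairs of subsets. The sketch below is an illustration only: it assumes that the implied constant in the $O(\cdot)$ error term is a fixed number `C` (set to 1 here, a hypothetical choice), and the function names are not from the text.

```python
from itertools import combinations

def subsets(vertices):
    """All subsets of a finite vertex list (exponentially many)."""
    for r in range(len(vertices) + 1):
        yield from combinations(vertices, r)

def is_eps_regular(V1, V2, E, eps, C=1.0):
    """Check the regularity condition for every A1, A2.

    The error term O(eps |V1 x V2|) is read as C * eps * |V1| * |V2|,
    where C is an assumed constant.
    """
    total = len(V1) * len(V2)
    density = len(E) / total
    for A1 in subsets(V1):
        for A2 in subsets(V2):
            count = sum(1 for a in A1 for b in A2 if (a, b) in E)
            if abs(count - density * len(A1) * len(A2)) > C * eps * total:
                return False
    return True

V1, V2 = [0, 1, 2, 3], [0, 1, 2, 3]
complete = {(a, b) for a in V1 for b in V2}
# All edges concentrated on one quarter of the vertex pairs.
corner = {(a, b) for a in V1[:2] for b in V2[:2]}

print(is_eps_regular(V1, V2, complete, 0.01))
print(is_eps_regular(V1, V2, corner, 0.1))
```

The complete bipartite graph passes for any positive tolerance, since every relative density equals the global one. The "corner" graph fails at tolerance 0.1 because the pair of sets carrying all the edges has relative density 1 against a global density of 1/4.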

Theorem 2.3 (Szemerédi regularity lemma, graph-theoretic version). Let
$(V_1, V_2, E)$ be a bipartite graph, and let $0 < \varepsilon < 1$. Assume that $V_1$ and $V_2$
are large depending on $\varepsilon$, thus $|V_1|, |V_2| > O_\varepsilon(1)$. Then there exist a positive
integer $J = O_\varepsilon(1)$ and decompositions

$V_i = V_{i,0} \cup V_{i,1} \cup \ldots \cup V_{i,J}$

for $i = 1, 2$ with the following properties:

• (Exceptional set) For all $i = 1, 2$, we have $|V_{i,0}| = O(\varepsilon |V_i|)$.

• (Uniform partition) For all $i = 1, 2$ and $1 \leq j \leq J$ we have $|V_{i,j}| = |V_{i,1}|$.

• (Regularity) The induced bipartite graph $(V_{1,j_1}, V_{2,j_2}, E \cap (V_{1,j_1} \times V_{2,j_2}))$
is $\varepsilon$-regular for all but $O(\varepsilon J^2)$ of the pairs $1 \leq j_1 \leq J$,
$1 \leq j_2 \leq J$.

Remark 2.4. The bound $J = O_\varepsilon(1)$ is a little deceptive, as it conceals the
fact that $J$ can in fact be extremely large depending on $1/\varepsilon$; indeed there
are examples where $J$ grows like an exponential tower of height equal to
some power of $1/\varepsilon$ (see [8]). However, the key point is that the bound on $J$
does not depend on the cardinality of $V_1$ or $V_2$. Indeed we shall shortly give
a probabilistic formulation in which $V_1$ and $V_2$ could be infinite (cf. [12]).

We now give a probabilistic generalization of the above regularity lemma. 
We first recall some standard notation from probability theory. 

Definition 2.5 (Probability space). A probability space is a triple $(\Omega, \mathcal{B}_{\max}, \mathbf{P})$,
where $\Omega$ is a set (called the sample space), $\mathcal{B}_{\max}$ is a σ-algebra^5 of sets of $\Omega$
(the elements of $\mathcal{B}_{\max}$ being the events), and $\mathbf{P}$ is a probability measure on
$\mathcal{B}_{\max}$ (thus it is non-negative and has total mass one). A random variable

^5 A σ-algebra is a collection $\mathcal{B}$ of sets in the probability space $\Omega$, which is closed under
(countable) unions, intersections, and complements, and contains the empty set and $\Omega$.
In our applications $\mathcal{B}$ will typically be finite, in which case it can be identified with a
finite partition $\Omega = \Omega_1 \cup \ldots \cup \Omega_M$ of the underlying probability space. Indeed, the cells
of this partition are the atoms (minimal non-empty elements) of $\mathcal{B}$, while $\mathcal{B}$ itself consists
of all the sets which are unions of zero or more atoms in the partition.
is any measurable map $X : \Omega \to K$ to some space $K$ (which will typically
either be a finite set, or the real line). We let $L^1(\mathcal{B}_{\max})$ denote the space
of real-valued, absolutely integrable random variables; as is customary we
identify two random variables if they agree outside of an event of zero
probability. If $X \in L^1(\mathcal{B}_{\max})$, we let $\mathbf{E}(X)$ denote the expectation of $X$. In
particular, if $E$ is an event, then $\mathbf{E}(1_E) = \mathbf{P}(E)$.

Remark 2.6. For application to the regularity lemma, $\Omega$ will be a finite set,
and $\mathcal{B}_{\max}$ will be the algebra of all subsets of $\Omega$, so there will be no issues
as to whether a random variable is measurable or integrable. However, it
is interesting to note that the arguments we give below extend with no
difficulty whatsoever to the case of infinite probability spaces.

Example 2.7. Our primary application will be to bipartite graphs, say
between two vertex classes $V_1$ and $V_2$. In this case we can take $\Omega = V_1 \times V_2$,
$\mathcal{B}_{\max}$ to be the power set of $\Omega$ (thus all subsets of $\Omega$ are measurable events),
and $\mathbf{P}$ to be the uniform probability measure on $\Omega$; this corresponds to the
operation of sampling two vertices $x_1$ and $x_2$ uniformly and independently at
random from $V_1$ and $V_2$ respectively. In this case, all functions $X : V_1 \times V_2 \to \mathbb{R}$
are measurable, and the expectation is just the average value on $V_1 \times V_2$.

A crucial concept from probability theory is that of conditional expecta- 
tion. 

Definition 2.8 (Conditional expectation). Let $(\Omega, \mathcal{B}_{\max}, \mathbf{P})$ be a probability
space, and let $\mathcal{B}$ be a sub-σ-algebra of $\mathcal{B}_{\max}$. If we let $L^2(\mathcal{B})$ be the Hilbert
space of $\mathcal{B}$-measurable, square-integrable real-valued random variables, with
the usual norm $\|X\|_{L^2(\mathcal{B})} := \mathbf{E}(|X|^2)^{1/2}$, then $L^2(\mathcal{B})$ is a closed subspace of
$L^2(\mathcal{B}_{\max})$, and we let $X \mapsto \mathbf{E}(X|\mathcal{B})$ be the associated orthogonal projection
map from $L^2(\mathcal{B}_{\max})$ to $L^2(\mathcal{B})$; thus for any square-integrable random variable
$X \in L^2(\mathcal{B}_{\max})$, $\mathbf{E}(X|\mathcal{B})$ will be a square-integrable $\mathcal{B}$-measurable random
variable.

The conditional expectation can be defined explicitly in the case when $\mathcal{B}$
is finite, which is in fact the only case we will need in this paper. In such
a case, the σ-algebra $\mathcal{B}$ is generated by a finite number of disjoint events
$A_1, \ldots, A_n$ of positive probability, possibly together with some additional
events of zero probability which we can safely ignore. If $X \in L^2(\mathcal{B}_{\max})$, the
conditional expectation $\mathbf{E}(X|\mathcal{B})$ will be equal (almost surely) to
$\mathbf{E}(X|A_i) := \frac{1}{\mathbf{P}(A_i)} \mathbf{E}(X 1_{A_i})$ on each event $A_i$.
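The explicit formula for a finite σ-algebra amounts to averaging $X$ over each atom. A minimal sketch, assuming a finite sample space stored as Python dictionaries (the function name `conditional_expectation` and the sample data are illustrative, not from the text):

```python
from fractions import Fraction

def conditional_expectation(X, prob, atoms):
    """E(X|B) for the sigma-algebra B generated by the given disjoint atoms.

    X and prob are dicts keyed by sample points; atoms is a list of sets of
    sample points, each of positive probability.
    """
    result = {}
    for atom in atoms:
        p_atom = sum(prob[w] for w in atom)                 # P(A_i)
        mean = sum(X[w] * prob[w] for w in atom) / p_atom   # E(X 1_{A_i}) / P(A_i)
        for w in atom:
            result[w] = mean                                # constant on the atom
    return result

omega = [0, 1, 2, 3]
prob = {w: Fraction(1, 4) for w in omega}   # uniform measure (an assumption)
X = {0: 1, 1: 3, 2: 0, 3: 10}
atoms = [{0, 1}, {2, 3}]
cond = conditional_expectation(X, prob, atoms)
print(cond)
```

Exact rational arithmetic is used so that identities such as $\mathbf{E}(\mathbf{E}(X|\mathcal{B})) = \mathbf{E}(X)$ hold without rounding error.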

Next, we define the complexity of a σ-algebra, which is a simplified version
of the Shannon entropy.

Definition 2.9 (Complexity). Let $\mathcal{B}$ be a finite σ-algebra in a probability
space $(\Omega, \mathcal{B}_{\max}, \mathbf{P})$. Then the complexity $\mathrm{complex}(\mathcal{B})$ of $\mathcal{B}$ is defined as the
least number of events needed to generate $\mathcal{B}$ as a σ-algebra.

Informally, a finite σ-algebra of complexity $M$ can be described using $M$
bits of information (equivalently, it contains at most $2^M$ atoms).
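The bound on the number of atoms can be seen concretely: each sample point is labelled by which of the generating events contain it, and there are at most $2^M$ such labels. The sketch below illustrates this under the assumption of a finite sample space; `atoms_generated` is a hypothetical helper name.

```python
def atoms_generated(omega, events):
    """Atoms of the sigma-algebra on omega generated by the given events."""
    cells = {}
    for w in omega:
        # The membership pattern of w: one bit per generating event.
        signature = tuple(w in A for A in events)
        cells.setdefault(signature, set()).add(w)
    return list(cells.values())

omega = range(8)
events = [{0, 1, 2, 3}, {0, 1, 4, 5}, {0, 2, 4, 6}]   # M = 3 generating events
atoms = atoms_generated(omega, events)
print(len(atoms))

nested = atoms_generated(omega, [{0, 1, 2, 3}, {0, 1}])  # M = 2, nested events
print(len(nested))
```

Three events in "general position" split the eight points into $2^3$ singleton atoms, while the two nested events produce only three atoms, fewer than the maximum $2^2$.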
If $\mathcal{B}, \mathcal{B}'$ are two sub-σ-algebras of $\mathcal{B}_{\max}$, we let $\mathcal{B} \vee \mathcal{B}'$ denote the smallest
σ-algebra which contains both $\mathcal{B}$ and $\mathcal{B}'$. Note that if $\mathcal{B}$ and $\mathcal{B}'$ are finite,
then $\mathcal{B} \vee \mathcal{B}'$ is also finite, with the sub-additivity property

$\mathrm{complex}(\mathcal{B} \vee \mathcal{B}') \leq \mathrm{complex}(\mathcal{B}) + \mathrm{complex}(\mathcal{B}')$.

Example 2.10. We continue the running example in Example 2.7. Any
partition $V_1 = V_{1,1} \cup \ldots \cup V_{1,M}$ of the first vertex class induces a partition
$V_1 \times V_2 = (V_{1,1} \times V_2) \cup \ldots \cup (V_{1,M} \times V_2)$ of the probability space and
hence creates a sub-σ-algebra $\mathcal{B}_1$ of $\mathcal{B}_{\max}$, which in information-theoretic
terms captures the information of which cell of the partition the first
vertex $x_1$ belongs to. The complexity of $\mathcal{B}_1$ is essentially $\log_2 M$. If we have
another partition $V_2 = V_{2,1} \cup \ldots \cup V_{2,M}$ of the second vertex class we can
form another σ-algebra $\mathcal{B}_2$, and thence create the joint σ-algebra $\mathcal{B}_1 \vee \mathcal{B}_2$,
whose atoms are pairs $V_{1,i} \times V_{2,j}$ and whose complexity is essentially $2 \log_2 M$
(assuming for sake of discussion that all the cells in the partitions are
non-empty). If $X : V_1 \times V_2 \to \mathbb{R}$ is any random variable (which one can think
of as a weight function assigning a number to each putative edge $(x_1, x_2)$),
the conditional expectation $\mathbf{E}(X|\mathcal{B}_1 \vee \mathcal{B}_2)$ is then the function which on
each pair of cells $V_{1,i} \times V_{2,j}$ takes a value equal to the relative density
$\frac{1}{|V_{1,i}||V_{2,j}|} \sum_{x_1 \in V_{1,i}} \sum_{x_2 \in V_{2,j}} X(x_1, x_2)$ of $X$ on this pair of cells. We remark
that when $X$ is the indicator function $X = 1_E$ of a graph, the $L^2$ norm of
this conditional expectation (which we shall refer to here as the energy) is
a familiar concept in the standard treatment of the regularity lemma and is
usually referred to as the index of the partitions $\mathcal{B}_1 \vee \mathcal{B}_2$.

We now give a probabilistic Szemerédi regularity lemma, which we state
in considerably more generality than we need to establish Theorem 2.3.

Theorem 2.11 (Szemerédi regularity lemma, probabilistic version). Let
$(\Omega, \mathcal{B}_{\max}, \mathbf{P})$ be a probability space, let $(\mathcal{B}_{i,\max})_{i \in I}$ be a finite collection of
sub-σ-algebras of $\mathcal{B}_{\max}$, and let $X \in L^2(\mathcal{B}_{\max})$ be a random variable with
$\|X\|_{L^2(\mathcal{B}_{\max})} \leq 1$. Let $\varepsilon > 0$ be a number, let $m \geq 0$, and let $F : \mathbb{R}^+ \to \mathbb{R}^+$
be an arbitrary monotone increasing function. Then there exist finite
sub-σ-algebras $\mathcal{B}_i \subseteq \mathcal{B}_i' \subseteq \mathcal{B}_{i,\max}$ for each $i \in I$, and a non-negative real number^6
$M$, obeying the following bounds:

• (Size of $M$) We have $M \geq m$ and $M = O_{\varepsilon,F,m}(1)$.

• (Complexity bound) We have $\mathrm{complex}(\mathcal{B}_i) \leq M$ for all $i \in I$.

• (Coarse and fine approximations are close) We have

(2)  $\left\| \mathbf{E}(X | \bigvee_{i \in I} \mathcal{B}_i') - \mathbf{E}(X | \bigvee_{i \in I} \mathcal{B}_i) \right\|_{L^2(\mathcal{B}_{\max})} \leq \varepsilon$.

^6 It may be helpful to the reader to think of $M$ as simply being the quantity
$\max_{i \in I}(m, \mathrm{complex}(\mathcal{B}_i))$. Thus the upper bound on $M$ translates to an upper bound
on the complexity of the coarse partitions $\mathcal{B}_i$, while the estimate (3) asserts, roughly
speaking, that the accuracy of the fine partitions exceeds the complexity of the coarse
partitions (and also exceeds any specified constant $m$) by an arbitrary growth function $F$.
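• (Fine approximation is extremely accurate) For any collection $(A_i)_{i \in I}$
of events with $A_i \in \mathcal{B}_{i,\max}$ for all $i \in I$, we have

(3)  $\left| \mathbf{E}\left( \left( X - \mathbf{E}(X | \bigvee_{i \in I} \mathcal{B}_i') \right) \prod_{i \in I} 1_{A_i} \right) \right| \leq \frac{1}{F(M)}$.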



Remark 2.12. In the application to Theorem 2.3 we will only need this
theorem in the special case when $X = 1_E$ is an indicator function, when
$I = \{1, 2\}$, when $\mathcal{B}_{1,\max}, \mathcal{B}_{2,\max}$ are finite and independent, with each
atom having equal probability, $F$ is essentially the exponential function,
and $\mathcal{B}_{\max} = \mathcal{B}_{1,\max} \vee \mathcal{B}_{2,\max}$. However the more general version above is no
harder to prove than this special case. One can also generalize to the case
when $X = (X_1, \ldots, X_n)$ is vector-valued, taking values in $\mathbb{R}^n$; on the graph
level, this would correspond to regularizing $n$ graphs simultaneously using
a single partitioning of the vertex classes. This vector-valued generalization
is useful for iteration purposes, in order to easily obtain the corresponding
hypergraph regularity lemma; this generalization is implicit in [21].



Remark 2.13. Informally, this theorem starts with a square-integrable
random variable $X$, and some reference σ-algebras $\mathcal{B}_{i,\max}$. It then creates two
approximations to $X$, namely a coarse approximation $\mathbf{E}(X | \bigvee_{i \in I} \mathcal{B}_i)$ and a
fine approximation $\mathbf{E}(X | \bigvee_{i \in I} \mathcal{B}_i')$. The coarse approximation depends on
only $M$ "bits" of information from each of the $\mathcal{B}_{i,\max}$, where $M$ is a quantity
for which we have some bounds. The fine approximation is rather close to
the coarse approximation in $L^2(\mathcal{B}_{\max})$ norm. Finally, the fine approximation
is extremely accurate, in the sense that adding an additional bit of
information from each of the $\mathcal{B}_{i,\max}$ can only create an additional correlation of
at most $1/F(M)$, where $F(M)$ is a function of $M$ which can be specified
in advance to be as rapidly growing as one pleases. (Of course, there is
a price to pay in selecting a function $F$ which grows too rapidly, which is
that the upper bound on $M$ will deteriorate.) Somewhat remarkably, no
independence or dependence assumptions between $X$ and the $\mathcal{B}_{i,\max}$ need
to be made in order for this theorem to be applicable.

We will prove Theorem 2.11 in the next section. For the remainder of
this section, we show how Theorem 2.11 implies Theorem 2.3.

Proof of Theorem 2.3 assuming Theorem 2.11. Let $G = (V_1, V_2, E)$ be a
bipartite graph; thus $E$ can be viewed as a subset of $V_1 \times V_2$. We then define
a probability space by setting the sample space $\Omega := V_1 \times V_2$, setting the
σ-algebra $\mathcal{B}_{\max} := 2^\Omega$ to be the space of all subsets of $\Omega$, and setting $\mathbf{P}$ to be the
uniform probability measure on $\Omega$. In particular, $E$ is now an event in $\mathcal{B}_{\max}$.
As mentioned in the introduction, this probability space corresponds to the
space generated by selecting vertices $x_1, x_2$ from $V_1, V_2$ independently and
uniformly. We then set $I := \{1, 2\}$, and set $\mathcal{B}_{1,\max} := \{A_1 \times V_2 : A_1 \subset V_1\}$
and $\mathcal{B}_{2,\max} := \{V_1 \times A_2 : A_2 \subset V_2\}$; thus $\mathcal{B}_{1,\max}$ and $\mathcal{B}_{2,\max}$ are the σ-algebras
generated by the random variables $x_1$ and $x_2$ respectively. We set $X := 1_E$;
clearly $\|X\|_{L^2(\mathcal{B}_{\max})} \leq 1$.

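We now apply Theorem 2.11 with the growth function $F : \mathbb{R}^+ \to \mathbb{R}^+$
to be chosen later, and $\varepsilon$ replaced by $\varepsilon^{3/2}$. This gives us some σ-algebras
$\mathcal{B}_1 \subseteq \mathcal{B}_1' \subseteq \mathcal{B}_{1,\max}$ and $\mathcal{B}_2 \subseteq \mathcal{B}_2' \subseteq \mathcal{B}_{2,\max}$ and a non-negative quantity
$M = O_{F,\varepsilon}(1)$ such that

(4)  $\mathrm{complex}(\mathcal{B}_1), \mathrm{complex}(\mathcal{B}_2) \leq M$

(5)  $\| \mathbf{E}(1_E | \mathcal{B}_1' \vee \mathcal{B}_2') - \mathbf{E}(1_E | \mathcal{B}_1 \vee \mathcal{B}_2) \|_{L^2(\mathcal{B}_{\max})} \leq \varepsilon^{3/2}$

(6)  $| \mathbf{E}((1_E - \mathbf{E}(1_E | \mathcal{B}_1' \vee \mathcal{B}_2')) 1_{A_1 \times A_2}) | \leq \frac{1}{F(M)}$

for all $A_1 \subset V_1$, $A_2 \subset V_2$.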

Now let $J$ be a large integer to be chosen later; we will eventually show
$J = O_\varepsilon(1)$. By hypothesis we may take $|V_1|, |V_2| > J$. For each $i \in \{1, 2\}$,
the finite σ-algebra $\mathcal{B}_i$ consists of at most $2^M$ atoms, thanks to (4). Then we
can subdivide each of these atoms arbitrarily into sets of size $\lfloor \frac{|V_i|}{(1 + O(\varepsilon)) J} \rfloor$,
plus an error of size $O(|V_i|/J)$. Combining all of the errors into a single
exceptional set $V_{i,0}$, we obtain a partition

$V_i = V_{i,0} \cup V_{i,1} \cup \ldots \cup V_{i,J}$,

where the sets $V_{i,1}, \ldots, V_{i,J}$ all have the same cardinality (comparable to
$|V_i|/J$), and each lies in an atom of $\mathcal{B}_i$, and the exceptional set $V_{i,0}$ obeys
the bounds

$|V_{i,0}| = O(\varepsilon |V_i|) + O(2^M |V_i| / J)$.

Thus, if we choose $J$ to be the nearest integer to $2^M/\varepsilon$, we obtain $|V_{i,0}| =
O(\varepsilon |V_i|)$ as desired. Also we observe that since $M = O_{F,\varepsilon}(1)$, we have
$J = O_{F,\varepsilon}(1)$.

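Now consider an induced bipartite graph
$G_{j_1,j_2} := (V_{1,j_1}, V_{2,j_2}, E \cap (V_{1,j_1} \times V_{2,j_2}))$, where $1 \leq j_1, j_2 \leq J$.
Suppose we wish to show that $G_{j_1,j_2}$ is $\varepsilon$-regular, thus

$|E \cap (A_1 \times A_2)| = \frac{|E \cap (V_{1,j_1} \times V_{2,j_2})|}{|V_{1,j_1}| |V_{2,j_2}|} |A_1| |A_2| + O(\varepsilon |V_{1,j_1}| |V_{2,j_2}|)$

whenever $A_1 \subset V_{1,j_1}$ and $A_2 \subset V_{2,j_2}$. By the triangle inequality (and by
specializing the estimate below to the case $A_1 = V_{1,j_1}$, $A_2 = V_{2,j_2}$), it suffices
to find a quantity $d$ which is independent of $A_1, A_2$ (but which depends on
$V_{1,j_1}, V_{2,j_2}$) such that

$|E \cap (A_1 \times A_2)| = d |A_1 \times A_2| + O(\varepsilon |V_{1,j_1}| |V_{2,j_2}|)$

whenever $A_1 \subset V_{1,j_1}$ and $A_2 \subset V_{2,j_2}$. Dividing by $|V_1||V_2|$, we can rewrite
this as

$\mathbf{E}((1_E - d) 1_{A_1 \times A_2}) = O(\varepsilon / J^2)$.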
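Observe that $A_1 \times A_2$ is contained in a single atom of $\mathcal{B}_1 \vee \mathcal{B}_2$. Thus we may
take $d := \mathbf{E}(1_E | \mathcal{B}_1 \vee \mathcal{B}_2)$ on this atom. Our task is thus to establish

$\mathbf{E}((1_E - \mathbf{E}(1_E | \mathcal{B}_1 \vee \mathcal{B}_2)) 1_{A_1 \times A_2}) = O(\varepsilon / J^2)$.

From (6) we have

$\mathbf{E}((1_E - \mathbf{E}(1_E | \mathcal{B}_1' \vee \mathcal{B}_2')) 1_{A_1 \times A_2}) = O(1/F(M))$

and so if we choose $F(M) := 2^{2M}/\varepsilon^3$ then we have

$\mathbf{E}((1_E - \mathbf{E}(1_E | \mathcal{B}_1' \vee \mathcal{B}_2')) 1_{A_1 \times A_2}) = O(\varepsilon^3 / 2^{2M}) = O(\varepsilon / J^2)$.

Note that we now have $J = O_{F,\varepsilon}(1) = O_\varepsilon(1)$ as desired. Thus, in order
to establish $\varepsilon$-regularity of $G_{j_1,j_2}$, it suffices by the triangle inequality to
establish that

$\mathbf{E}(|\mathbf{E}(1_E | \mathcal{B}_1' \vee \mathcal{B}_2') - \mathbf{E}(1_E | \mathcal{B}_1 \vee \mathcal{B}_2)| 1_{V_{1,j_1} \times V_{2,j_2}}) = O(\varepsilon / J^2)$.

Note that $\mathbf{E}(1_{V_{1,j_1} \times V_{2,j_2}}) = O(1/J^2)$. Thus by Cauchy-Schwarz, it would
suffice to show that

(7)  $\mathbf{E}(|\mathbf{E}(1_E | \mathcal{B}_1' \vee \mathcal{B}_2') - \mathbf{E}(1_E | \mathcal{B}_1 \vee \mathcal{B}_2)|^2 1_{V_{1,j_1} \times V_{2,j_2}}) = O(\varepsilon^2 / J^2)$.

On the other hand, from (5) we have

$\mathbf{E}(|\mathbf{E}(1_E | \mathcal{B}_1' \vee \mathcal{B}_2') - \mathbf{E}(1_E | \mathcal{B}_1 \vee \mathcal{B}_2)|^2) = O(\varepsilon^3)$.

Thus there are at most $O(\varepsilon J^2)$ pairs $(j_1, j_2)$ for which (7) fails. Thus we
have $\varepsilon$-regularity for all but at most $O(\varepsilon J^2)$ pairs, as desired. □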
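The parameter choices in this proof can be sanity-checked with exact arithmetic. The sketch below assumes the choices $J \approx 2^M/\varepsilon$ (nearest integer) and $F(M) = 2^{2M}/\varepsilon^3$ and compares $1/F(M)$ with $\varepsilon/J^2$; the function name `regularity_parameters` and the sample values of $M$ and $\varepsilon$ are illustrative only.

```python
from fractions import Fraction

def regularity_parameters(M, eps):
    """Return J, F(M), and the ratio (1/F(M)) / (eps / J^2) as exact fractions."""
    J = round(Fraction(2) ** M / eps)          # nearest integer to 2^M / eps
    F_M = Fraction(2) ** (2 * M) / eps ** 3    # growth function value F(M)
    ratio = (1 / F_M) / (eps / J ** 2)         # equals (eps * J / 2^M)^2
    return J, F_M, ratio

J, F_M, ratio = regularity_parameters(5, Fraction(1, 10))
print(J, F_M, ratio)

J2, F_M2, ratio2 = regularity_parameters(3, Fraction(3, 10))
print(J2, F_M2, ratio2)
```

When $2^M/\varepsilon$ is an integer the ratio is exactly 1; otherwise rounding $J$ perturbs it slightly, which is absorbed by the $O(\cdot)$ notation.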

Remark 2.14. It is clear from the argument that we can enforce a lower
bound on the number $J$ of partitions, simply by setting the parameter $m$
equal to a large number rather than equal to zero, since this will give a lower
bound for $M$ and hence for $J$. Of course, this will also increase the lower
bound required for $|V_1|, |V_2|$, although in applications the cases when $|V_1|$ or
$|V_2|$ are small tend to be fairly easy (and the regularity lemma is of little
use in such situations anyway). Also, by considering multiple vertex sets
$(V_i)_{i \in I}$ instead of just two, one can prove a version of the hypergraph regularity
lemma (similar to the early hypergraph lemma in [3]) by a similar argument
to the one given above; we omit the details. However, to obtain the stronger
and more modern versions of the hypergraph regularity lemma one needs to
apply results such as the one above repeatedly; see [5] for more details.

3. Proof of Theorem 2.11

We now give the proof of Theorem 2.11. Let us fix $(\Omega, \mathcal{B}_{\max}, \mathbf{P})$, $(\mathcal{B}_{i,\max})_{i \in I}$,
$X$, $\varepsilon$, $m$, $F$. A crucial concept in the proof (as in the standard proof of the
regularity lemma) will be that of the energy (or index) of a σ-algebra (or
partition) . This energy has a particularly simple description in the language 
of conditional expectation: 






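Definition 3.1. For any $\sigma$-algebra $\mathcal{B} \subset \mathcal{B}_{\max}$, we define the energy $\mathcal{E}(\mathcal{B})$ of
$\mathcal{B}$ to be the quantity

$\mathcal{E}(\mathcal{B}) := \|\mathbf{E}(X|\mathcal{B})\|_{L^2(\mathcal{B}_{\max})}^2$.

Informally, $\mathcal{E}(\mathcal{B})$ measures how close the subspace $L^2(\mathcal{B})$ of the Hilbert space
$L^2(\mathcal{B}_{\max})$ gets to containing the vector $X$.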

Remark 3.2. In the running example of Example 2.10, with $X$ the indicator
function of a graph and $\mathcal{B} = \mathcal{B}_1 \vee \mathcal{B}_2$, the energy corresponds to the index
of the partitions associated to $\mathcal{B}_1, \mathcal{B}_2$, as used for instance in [19].

From the hypothesis $\|X\|_{L^2(\mathcal{B}_{\max})} \le 1$, and the fact that $X \mapsto \mathbf{E}(X|\mathcal{B})$ is
an orthogonal projection, we observe the estimate

(8) $0 \le \mathcal{E}(\mathcal{B}) \le 1$.

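Also, if $\mathcal{B} \subset \mathcal{B}'$, then a simple application of Pythagoras's theorem yields

(9) $\mathcal{E}(\mathcal{B}') = \mathcal{E}(\mathcal{B}) + \|\mathbf{E}(X|\mathcal{B}') - \mathbf{E}(X|\mathcal{B})\|_{L^2(\mathcal{B}_{\max})}^2$.

In particular, finer $\sigma$-algebras have higher energy.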
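The identity (9) can be checked by hand on a small example. The sketch below assumes a uniform probability measure on a finite set and encodes a finite $\sigma$-algebra by labelling each point with its atom; the data and the helper names (cond_exp, sq_norm, energy) are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

def cond_exp(x, labels):
    """E(X|B) on a uniform finite space; labels[w] names the atom of B containing w."""
    total, count = {}, {}
    for value, lab in zip(x, labels):
        total[lab] = total.get(lab, 0) + value
        count[lab] = count.get(lab, 0) + 1
    return [Fraction(total[lab], count[lab]) for lab in labels]

def sq_norm(f):
    """Squared L^2 norm E(|f|^2) under the uniform measure."""
    return sum(v * v for v in f) / len(f)

def energy(x, labels):
    """Energy of the partition: the squared L^2 norm of E(X|B)."""
    return sq_norm(cond_exp(x, labels))

# Hypothetical data: X is a 0/1 indicator, and `fine` refines `coarse`.
x = [1, 0, 1, 1, 0, 0, 1, 0]
coarse = [0, 0, 0, 0, 1, 1, 1, 1]
fine = [0, 0, 1, 1, 2, 2, 3, 3]

gap = sq_norm([a - b for a, b in zip(cond_exp(x, fine), cond_exp(x, coarse))])
print(energy(x, coarse), energy(x, fine), gap)  # 5/16 3/8 1/16
```

Exact rational arithmetic is used so that the two sides of (9) agree exactly rather than up to rounding: here 5/16 + 1/16 = 3/8.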

We shall prove the regularity lemma via an energy incrementation
argument. We shall take some $\sigma$-algebras $\mathcal{B}_i, \mathcal{B}'_i$ and see if they verify the
required properties of the lemma. If they do not, we will be able to replace
some of these $\sigma$-algebras by finer $\sigma$-algebras with slightly higher complexity
and somewhat larger energy. The bounds (8) will be used to show that
this energy incrementation cannot continue indefinitely, and when it does
stop, we will establish the theorem.

The key step in the argument is the following.

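Lemma 3.3 (Lack of regularity implies energy increment). Suppose we have
finite $\sigma$-algebras $\mathcal{B}'_i \subset \mathcal{B}_{i,\max}$ and events $A_i \in \mathcal{B}_{i,\max}$ for each $i \in I$ such that

$\left| \mathbf{E}\left( (X - \mathbf{E}(X|\bigvee_{i \in I} \mathcal{B}'_i)) \prod_{i \in I} 1_{A_i} \right) \right| > \frac{1}{F(M)}$

for some $M > 0$. Then if we set

$\mathcal{B}''_i := \mathcal{B}'_i \vee \{\emptyset, A_i, \Omega \setminus A_i, \Omega\}$ for all $i \in I$

(thus $\mathcal{B}''_i$ is the $\sigma$-algebra generated by $\mathcal{B}'_i$ and $A_i$), then we have the
complexity increment

(10) $\mathrm{complex}(\mathcal{B}''_i) \le \mathrm{complex}(\mathcal{B}'_i) + 1$ for all $i \in I$

and the energy increment

(11) $\mathcal{E}(\bigvee_{i \in I} \mathcal{B}''_i) \ge \mathcal{E}(\bigvee_{i \in I} \mathcal{B}'_i) + \frac{1}{F(M)^2}$.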
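Proof. The complexity increment is immediate from the definition of
complexity. As for the energy increment, observe that $\prod_{i \in I} 1_{A_i}$ is measurable
in $\bigvee_{i \in I} \mathcal{B}''_i$. Thus we have

$\mathbf{E}\left( (X - \mathbf{E}(X|\bigvee_{i \in I} \mathcal{B}'_i)) \prod_{i \in I} 1_{A_i} \right) = \mathbf{E}\left( (\mathbf{E}(X|\bigvee_{i \in I} \mathcal{B}''_i) - \mathbf{E}(X|\bigvee_{i \in I} \mathcal{B}'_i)) \prod_{i \in I} 1_{A_i} \right)$.

On the other hand, we clearly have $\mathbf{E}((\prod_{i \in I} 1_{A_i})^2) \le 1$. Applying
Cauchy-Schwarz, we conclude

$\left| \mathbf{E}\left( (X - \mathbf{E}(X|\bigvee_{i \in I} \mathcal{B}'_i)) \prod_{i \in I} 1_{A_i} \right) \right| \le \left\| \mathbf{E}(X|\bigvee_{i \in I} \mathcal{B}''_i) - \mathbf{E}(X|\bigvee_{i \in I} \mathcal{B}'_i) \right\|_{L^2(\mathcal{B}_{\max})}$.

By hypothesis, we thus have

$\left\| \mathbf{E}(X|\bigvee_{i \in I} \mathcal{B}''_i) - \mathbf{E}(X|\bigvee_{i \in I} \mathcal{B}'_i) \right\|_{L^2(\mathcal{B}_{\max})} > \frac{1}{F(M)}$.

The claim now follows from (9). □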
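To illustrate the mechanism of Lemma 3.3 in the graph setting, the following sketch takes $X$ to be the edge indicator of a small bipartite graph with the uniform measure on $V_1 \times V_2$, starts from the trivial partitions, and refines them by two test sets. The graph, the sets and the function names are hypothetical; the only claim checked is that the energy gain is at least the square of the correlation, as in the proof above.

```python
from fractions import Fraction
from itertools import product

def cell_stats(adj, lab1, lab2):
    """Edge count and size of each atom of the product partition B1 v B2."""
    edges, size = {}, {}
    for u, v in product(range(len(lab1)), range(len(lab2))):
        key = (lab1[u], lab2[v])
        edges[key] = edges.get(key, 0) + adj[u][v]
        size[key] = size.get(key, 0) + 1
    return edges, size

def energy(adj, lab1, lab2):
    """E(|E(X | B1 v B2)|^2) for X the edge indicator, uniform measure on V1 x V2."""
    edges, size = cell_stats(adj, lab1, lab2)
    n = len(lab1) * len(lab2)
    return sum(Fraction(edges[k] ** 2, size[k]) for k in edges) / n

def correlation(adj, lab1, lab2, a1, a2):
    """E((X - E(X | B1 v B2)) 1_{A1 x A2})."""
    edges, size = cell_stats(adj, lab1, lab2)
    n = len(lab1) * len(lab2)
    total = Fraction(0)
    for u, v in product(a1, a2):
        key = (lab1[u], lab2[v])
        total += adj[u][v] - Fraction(edges[key], size[key])
    return total / n

def refine(labels, a):
    """Atom labels of the sigma-algebra generated by the old partition and the set a."""
    return [(lab, w in a) for w, lab in enumerate(labels)]

# Hypothetical example: the half graph u <= v on 4 + 4 vertices, trivial partitions.
adj = [[1 if u <= v else 0 for v in range(4)] for u in range(4)]
b1 = b2 = [0, 0, 0, 0]
a1, a2 = {0, 1}, {2, 3}

c = correlation(adj, b1, b2, a1, a2)
gain = energy(adj, refine(b1, a1), refine(b2, a2)) - energy(adj, b1, b2)
print(c, gain)  # 3/32 9/64
```

Here the gain 9/64 comfortably exceeds $c^2 = 9/1024$; the Cauchy-Schwarz step is typically far from sharp.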

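We can now quickly prove Theorem 2.11. We shall run the following
double-loop algorithm to generate $\mathcal{B}_i$, $\mathcal{B}'_i$, and $M$.

• Step 0: Initialize $\mathcal{B}_i = \mathcal{B}'_i = \{\emptyset, \Omega\}$ to be the trivial $\sigma$-algebra for
each $i \in I$.

• Step 1: Set $M$ to be the quantity

$M := \max\left( m, \max_{i \in I} \mathrm{complex}(\mathcal{B}_i) \right)$.

Thus, for instance, the initial value of $M$ will be $m$.

• Step 2: If the accuracy estimate for the fine approximation in Theorem 2.11
holds, then we halt the algorithm. Otherwise, we can
apply Lemma 3.3 to locate $\sigma$-algebras $\mathcal{B}'_i \subset \mathcal{B}''_i \subset \mathcal{B}_{i,\max}$ for $i \in I$
obeying (10) and (11).

• Step 3: If we have

$\mathcal{E}(\bigvee_{i \in I} \mathcal{B}''_i) \le \mathcal{E}(\bigvee_{i \in I} \mathcal{B}_i) + \varepsilon^2$

then we set $\mathcal{B}'_i$ equal to $\mathcal{B}''_i$ for each $i \in I$, and return to Step 2.
Otherwise, we set $\mathcal{B}_i$ and $\mathcal{B}'_i$ both equal to $\mathcal{B}''_i$ for each $i \in I$, and
return to Step 1.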

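The following observations about the above algorithm are easily verified
by induction:

• At every stage of the algorithm, we have $\mathcal{B}_i \subset \mathcal{B}'_i \subset \mathcal{B}_{i,\max}$ for all
$i \in I$.

• At every stage of the algorithm, we have

$\mathcal{E}(\bigvee_{i \in I} \mathcal{B}'_i) \le \mathcal{E}(\bigvee_{i \in I} \mathcal{B}_i) + \varepsilon^2$

and hence by (9) we have $\|\mathbf{E}(X|\bigvee_{i \in I} \mathcal{B}'_i) - \mathbf{E}(X|\bigvee_{i \in I} \mathcal{B}_i)\|_{L^2(\mathcal{B}_{\max})} \le \varepsilon$, so the coarse and fine approximations are close.

• At every stage of the algorithm we have $m \le M$ and $\mathrm{complex}(\mathcal{B}_i) \le M$
for all $i \in I$.

Thus, if the algorithm does halt (so that the accuracy estimate for the fine
approximation holds), then we will have
achieved every objective of Theorem 2.11 except possibly for the upper
bound $M = O_{F,\varepsilon,m}(1)$ on $M$. Hence the only remaining task is to show that
the algorithm does indeed halt in finite time with the required bound on $M$.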

Let us first analyze the inner loop of the algorithm, which loops between
Step 2 and Step 3. At the start of this inner loop (i.e. when one enters Step
2 from Step 1), the $\mathcal{B}'_i$ are equal to $\mathcal{B}_i$. At each execution of this inner loop,
the energy $\mathcal{E}(\bigvee_{i \in I} \mathcal{B}'_i)$ increases by at least $\frac{1}{F(M)^2}$, thanks to (11), while the
complexities $\mathrm{complex}(\mathcal{B}'_i)$ increase by at most 1, thanks to (10). On the
other hand, if the energy $\mathcal{E}(\bigvee_{i \in I} \mathcal{B}'_i)$ ever increases by more than $\varepsilon^2$, then
we will end the inner loop and instead trigger the outer loop (returning
from Step 3 to Step 1). Thus for any fixed iteration of the outer loop, the
inner loop can run for at most $F(M)^2/\varepsilon^2 + 1$ iterations, and the complexity
of the $\sigma$-algebras increases by at most $F(M)^2/\varepsilon^2 + 1$ when doing so. In
particular, the inner loop always terminates in finite time.

Now we can analyze the outer loop. At the beginning of this loop, the $\mathcal{B}_i$
are equal to the trivial algebra, and $M$ is equal to $m$. After each iteration
of this outer loop, each $\mathcal{B}_i$ is replaced by a $\sigma$-algebra $\mathcal{B}''_i$ whose complexity
is at most $F(M)^2/\varepsilon^2 + 1$ more than the complexity of $\mathcal{B}_i$. In particular, the
complexity of the new value of $\mathcal{B}_i$ is at most $M + F(M)^2/\varepsilon^2 + 1$, which causes
the new value of $M$ to be bounded by $M + F(M)^2/\varepsilon^2 + 1$. Also, the energy
$\mathcal{E}(\bigvee_{i \in I} \mathcal{B}_i)$ of the $\mathcal{B}_i$ will increase by at least $\varepsilon^2$. From (8) we thus see that the
outer loop can execute at most $\lfloor 1/\varepsilon^2 \rfloor$ times. Thus the algorithm terminates in
finite time, and the final value of $M$ is bounded by the quantity obtained
by applying $\lfloor 1/\varepsilon^2 \rfloor$ iterations of the map $M \mapsto M + F(M)^2/\varepsilon^2 + 1$ to $m$, so
in particular $M = O_{F,\varepsilon,m}(1)$. This completes the proof of Theorem 2.11. ■
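To get a feeling for how quickly this bound grows, one can simply iterate the map numerically. The sketch below is only an illustration of the counting argument, not an implementation of the algorithm itself; the function name bound_on_M and the linear growth function are hypothetical choices.

```python
from fractions import Fraction
from math import floor

def bound_on_M(m, eps, F):
    """Apply floor(1/eps^2) iterations of M -> M + F(M)^2/eps^2 + 1, starting at m."""
    eps = Fraction(eps)
    M = Fraction(m)
    for _ in range(floor(1 / eps ** 2)):
        M = M + F(M) ** 2 / eps ** 2 + 1
    return M

# Illustrative (hypothetical) growth function; any increasing F could be substituted.
linear = lambda M: M + 1

print(bound_on_M(0, 1, linear))               # 2
print(bound_on_M(0, Fraction(1, 2), linear))  # 33383766300
```

Even with a linear growth function, halving $\varepsilon$ from 1 to 1/2 already moves the bound from 2 to about $3 \times 10^{10}$, since the map is applied four times instead of once.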

Remark 3.4. The doubly-iterated nature of the argument, combined with
the desire for the growth function $F$ to be exponential for the application to
Theorem 2.3, causes the final bounds on $M$ (and hence on $J$) to be
tower-exponential in $1/\varepsilon^C$ for some absolute constant $C$. As discussed in [8],
this tower-exponential bound cannot be significantly improved. However,
by lowering $F$ to linear or polynomial growth one can obtain a somewhat
weaker regularity lemma, but with better bounds; see [10] for some further
discussion on how one can adjust the strength of the regularity lemma to
suit one's application. In the converse direction, we will need to increase
$F$ further, to tower-exponential or even faster, when we iterate this lemma
to obtain hypergraph regularity lemmas¹. The flexibility afforded by this
additional parameter, which is not present in the usual formulation of the
regularity lemma, may hopefully be useful for other applications also.

¹ Basically, to obtain a satisfactory regularity control on hypergraphs, say 3-uniform
hypergraphs, one has to first apply a result such as Theorem 2.11 with some growth
function $F^{\mathrm{fast}}$ to approximate some 3-uniform object by a collection of 2-uniform
$\sigma$-algebras (i.e. partitions of complete graphs into incomplete graphs). One then applies
Theorem 2.11 again with another growth function $F$ to approximate the atoms of those
2-uniform $\sigma$-algebras by some 1-uniform objects (vertex partitions). In order for the error
terms to be manageable, it turns out that $F^{\mathrm{fast}}$ has to grow much faster than $F$, in fact
it must essentially be an iterated version of $F$. See [21] for further discussion.

4. An entropy variant of the regularity lemma

One can also give a variant of the above arguments, in which the $L^2$ norm
is replaced by the Shannon entropy. In particular, the energy incrementation
argument is replaced by an entropy incrementation argument, which
gives the lemma a much more information-theoretic flavour than before. As
always we fix an ambient probability space $(\Omega, \mathcal{B}_{\max}, \mathbf{P})$.

Definition 4.1 (Entropy). If $\mathcal{B} \subset \mathcal{B}_{\max}$ is a finite $\sigma$-algebra, we define the
Shannon entropy $\mathbf{H}(\mathcal{B})$ to be the quantity

$\mathbf{H}(\mathcal{B}) := \sum_A \mathbf{P}(A) \log_2 \frac{1}{\mathbf{P}(A)}$

where $A$ ranges over all the atoms of $\mathcal{B}$ and we adopt the convention $0 \log_2 \frac{1}{0} = 0$.
If $X$ is a random variable taking only finitely many values, we define
$\mathbf{H}(X) := \mathbf{H}(\mathcal{B}_X)$, where $\mathcal{B}_X$ is the $\sigma$-algebra generated by $X$. In other
words

$\mathbf{H}(X) := \sum_x \mathbf{P}(X = x) \log_2 \frac{1}{\mathbf{P}(X = x)}$.

It is easy to verify that if $X$ is a Boolean variable (only taking the values
0 and 1), then $\mathbf{H}(X)$ can be at most 1. More generally, we have the inequality

$\mathbf{H}(\mathcal{B}) \le \mathrm{complex}(\mathcal{B})$

for any finite $\sigma$-algebra $\mathcal{B}$. The quantity $\mathbf{H}(X)$ measures, roughly speaking,
how much information one could learn from $X$. It can be viewed as a more
refined version of the complexity, which is less sensitive to exceptional events
of small probability than the complexity is.

In the probabilistic formulation of the regularity lemma, conditional
expectation played a prominent role. In the entropy formulation, the analogous
concept is conditional entropy.

Definition 4.2 (Conditional entropy). If $X, Y$ are random variables taking
finitely many values, we define the conditional entropy $\mathbf{H}(X|Y)$ by the
formula

$\mathbf{H}(X|Y) := \sum_y \mathbf{P}(Y = y) \mathbf{H}(X|Y = y)$

$= \sum_y \mathbf{P}(Y = y) \sum_x \mathbf{P}(X = x|Y = y) \log_2 \frac{1}{\mathbf{P}(X = x|Y = y)}$.
An equivalent definition is given by the Bayes identity

$\mathbf{H}(X|Y) = \mathbf{H}(X, Y) - \mathbf{H}(Y)$.

The quantity $\mathbf{H}(X|Y)$ measures, roughly speaking, how much new information
one could still learn from $X$ if one already knew the value of $Y$ (thus
for instance $\mathbf{H}(X|X)$ is always zero).
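As a quick sanity check, the entropy and conditional entropy of a small joint distribution can be computed directly from the definitions and compared with the Bayes identity. This is a minimal sketch; the joint law and the helper names are hypothetical, and distributions are represented as dictionaries from outcomes to probabilities.

```python
from math import log2, isclose

def entropy(pmf):
    """Shannon entropy in bits of a pmf {outcome: probability}; 0 log(1/0) = 0."""
    return sum(p * log2(1 / p) for p in pmf.values() if p > 0)

def marginal(joint, index):
    """Marginal pmf of one coordinate of a joint pmf {(x, y): probability}."""
    out = {}
    for outcome, p in joint.items():
        out[outcome[index]] = out.get(outcome[index], 0.0) + p
    return out

def cond_entropy(joint):
    """H(X|Y) from the definition: sum over y of P(Y=y) H(X|Y=y)."""
    total = 0.0
    for y, py in marginal(joint, 1).items():
        if py > 0:
            slice_y = {x: p / py for (x, yy), p in joint.items() if yy == y}
            total += py * entropy(slice_y)
    return total

# Hypothetical joint law of (X, Y).
joint = {(0, 0): 0.5, (0, 1): 0.25, (1, 1): 0.25}
print(entropy(joint), entropy(marginal(joint, 1)), cond_entropy(joint))  # 1.5 1.0 0.5
```

The three printed values are $\mathbf{H}(X,Y)$, $\mathbf{H}(Y)$ and $\mathbf{H}(X|Y)$, and indeed 1.5 - 1.0 = 0.5.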

Another key quantity we need is the conditional mutual information
$\mathbf{I}(X : Y|Z)$ of three random variables $X, Y, Z$ taking finitely many values, defined
by

$\mathbf{I}(X : Y|Z) := \mathbf{H}(X|Z) - \mathbf{H}(X|Y, Z) = \mathbf{H}(Y|Z) - \mathbf{H}(Y|X, Z)$;

informally, it measures how much knowing $Y$ would tell one about $X$, or
vice versa, assuming that $Z$ is already known. A handy (and intuitive) fact
is that the conditional mutual information is always non-negative; this is
equivalent to the submodularity inequality

$\mathbf{H}(X, Y, Z) + \mathbf{H}(Z) \le \mathbf{H}(X, Z) + \mathbf{H}(Y, Z)$

for entropy, and can be proven via Jensen's inequality. A more quantitative
assertion of this fact is given in Lemma 4.4 below.
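The following sketch computes the conditional mutual information from joint entropies, using $\mathbf{H}(U|V) = \mathbf{H}(U,V) - \mathbf{H}(V)$, and checks the symmetry and submodularity statements on a standard toy example. The example (two independent fair bits and their exclusive-or) and the function names are illustrative assumptions.

```python
from math import log2, isclose

def H(joint, coords):
    """Entropy in bits of the coordinates `coords` of outcomes drawn from `joint`."""
    marg = {}
    for w, p in joint.items():
        key = tuple(w[c] for c in coords)
        marg[key] = marg.get(key, 0.0) + p
    return sum(p * log2(1 / p) for p in marg.values() if p > 0)

def cmi(joint, x, y, z):
    """I(X:Y|Z) = H(X|Z) - H(X|Y,Z); x, y, z are tuples of coordinate indices."""
    return (H(joint, x + z) - H(joint, z)) - (H(joint, x + y + z) - H(joint, y + z))

# Hypothetical example: X and Z are independent fair bits and Y = X xor Z.
# Outcomes are triples (x, y, z).
joint = {(x, x ^ z, z): 0.25 for x in (0, 1) for z in (0, 1)}
X, Y, Z = (0,), (1,), (2,)

print(cmi(joint, X, Y, Z))   # 1.0: once Z is known, Y reveals X completely
print(cmi(joint, X, Y, ()))  # 0.0: without Z, Y says nothing about X
```

The example also shows that conditioning can increase mutual information, which is why the conditional quantity, rather than $\mathbf{I}(X : Y)$, is the one that appears in the lemma below.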

If $X$ and $Y$ are random variables, we write $X \mapsto Y$, and say that $Y$ is
determined by $X$, if $\mathcal{B}_Y \subset \mathcal{B}_X$. If $X$ and $Y$ take only finitely many values, then
$X \mapsto Y$ is equivalent to the existence of a functional relationship $Y = f(X)$
for some deterministic function $f$, and is also equivalent (up to events of
probability zero) to the conditional entropy $\mathbf{H}(Y|X)$ vanishing.

We now give the information-theoretic analogue of Theorem 2.11. To
simplify the notation a little bit we will restrict to the case $I = \{1, 2\}$,
although the generalization to more than two reference $\sigma$-algebras is not
difficult.

Lemma 4.3 (Information-theoretic regularity lemma). Let $X_1, X_2, Y$ be
random variables taking finitely many values such that $\mathbf{H}(Y) \le m$ for some
$m > 0$. Let $F : \mathbf{R}^+ \to \mathbf{R}^+$ be an arbitrary function, and $\varepsilon > 0$. Then there
exist random variables $Z_1, Z_2$ (the "coarse approximation") and $Z'_1, Z'_2$ (the
"fine approximation"), also taking finitely many values, with the following
properties.

• (Determinism) We have the determinism relations

(12) $X_1 \mapsto Z'_1 \mapsto Z_1$; $X_2 \mapsto Z'_2 \mapsto Z_2$.

• (Coarse approximation has bounded entropy) We have

(13) $\mathbf{H}(Z_1, Z_2) \le \mathbf{H}(Z'_1, Z'_2) = O_{F,\varepsilon,m}(1)$.

• (Coarse and fine approximations are close) We have

(14) $\mathbf{I}(Y : Z'_1, Z'_2 | Z_1, Z_2) \le \varepsilon$.

• (Fine approximation is nearly optimal) For any random variables
$W_1, W_2$ with $X_1 \mapsto W_1$ and $X_2 \mapsto W_2$ we have

(15) $\mathbf{I}(Y : W_1, W_2 | Z'_1, Z'_2) \le \frac{\mathbf{H}(W_1, W_2)}{F(\mathbf{H}(Z_1, Z_2))}$.

Proof. To construct $Z_1, Z_2, Z'_1, Z'_2$ we perform the following "entropy
incrementation" algorithm, which is closely analogous to the energy
incrementation algorithm used in the proof of Theorem 2.11.

• Step 0. Initialize $Z_1 = Z_2 = 0$ (one can of course replace 0 by any
other deterministic random variable).

• Step 1. Let $Z'_1, Z'_2$ be random variables which minimize the quantity

(16) $\mathbf{H}(Y | Z'_1, Z'_2) + \frac{\mathbf{H}(Z'_1, Z'_2)}{F(\mathbf{H}(Z_1, Z_2))}$

subject to the constraints $X_1 \mapsto Z'_1 \mapsto Z_1$ and $X_2 \mapsto Z'_2 \mapsto Z_2$. (If
there are several such minimizers, we select among them arbitrarily.)

• Step 2. If we have

$\mathbf{H}(Y | Z_1, Z_2) - \mathbf{H}(Y | Z'_1, Z'_2) > \varepsilon$

then we replace $Z_1, Z_2$ with $Z'_1, Z'_2$ respectively, and return to Step
1. Otherwise, we terminate the algorithm.
We remark that because $X_1, X_2$ take only finitely many values, the
number of possibilities for the random variables $Z'_1, Z'_2$ is finite up to equivalence.
Hence a minimizer to the quantity (16) always exists. Intuitively, $Z'_1, Z'_2$ is
constructed to capture as much information about $Y$ as is possible while
remaining determined by $X_1, X_2$; the slight penalty term in (16) is designed
to keep some control of the entropy of $Z'_1, Z'_2$ (otherwise it would be as large
as that of $X_1, X_2$, for which we have no bounds). Observe that every time
we return from Step 2 to Step 1, the quantity $\mathbf{H}(Y | Z_1, Z_2)$ (which measures
the amount of information in $Y$ that remains to be captured by $Z_1, Z_2$)
decreases by at least $\varepsilon$. On the other hand, from Jensen's inequality one can
verify that

$0 \le \mathbf{H}(Y | Z_1, Z_2) \le \mathbf{H}(Y) \le m$.

Thus the above algorithm must halt after at most $m/\varepsilon$ iterations. It is also
clear that the random variables $Z_1, Z_2, Z'_1, Z'_2$ generated by this algorithm
will obey the determinism relationships (12) and the closeness bound (14).

Also, if $W_1, W_2$ are any random variables determined by $X_1, X_2$
respectively, then by comparing the minimizer $Z'_1, Z'_2$ against the competitor $(Z'_1, W_1)$,
$(Z'_2, W_2)$ (which obeys the required constraints), we have

$\mathbf{H}(Y | Z'_1, Z'_2) + \frac{\mathbf{H}(Z'_1, Z'_2)}{F(\mathbf{H}(Z_1, Z_2))} \le \mathbf{H}(Y | Z'_1, Z'_2, W_1, W_2) + \frac{\mathbf{H}(Z'_1, Z'_2, W_1, W_2)}{F(\mathbf{H}(Z_1, Z_2))}$.

Since $\mathbf{H}(Y | Z'_1, Z'_2) - \mathbf{H}(Y | Z'_1, Z'_2, W_1, W_2) = \mathbf{I}(Y : W_1, W_2 | Z'_1, Z'_2)$ and
$\mathbf{H}(Z'_1, Z'_2, W_1, W_2) \le \mathbf{H}(Z'_1, Z'_2) + \mathbf{H}(W_1, W_2)$, we obtain (15) as desired after some algebra.

Now we compare the entropies of $Z_1, Z_2$ and $Z'_1, Z'_2$. Since $Z_1, Z_2$ obeys
the constraints in the minimization problem (16), we have

$\mathbf{H}(Y | Z'_1, Z'_2) + \frac{\mathbf{H}(Z'_1, Z'_2)}{F(\mathbf{H}(Z_1, Z_2))} \le \mathbf{H}(Y | Z_1, Z_2) + \frac{\mathbf{H}(Z_1, Z_2)}{F(\mathbf{H}(Z_1, Z_2))}$.

As observed earlier, the first summand on either side ranges between 0 and
$m$. Thus we have (after some rearranging)

$\mathbf{H}(Z'_1, Z'_2) \le \mathbf{H}(Z_1, Z_2) + m F(\mathbf{H}(Z_1, Z_2))$.

In particular, every time we return from Step 2 to Step 1, the quantity
$\mathbf{H}(Z_1, Z_2)$ increases by at most $m F(\mathbf{H}(Z_1, Z_2))$. From Step 0, the initial
value of $\mathbf{H}(Z_1, Z_2)$ is 0. Since the number of iterations is bounded by $m/\varepsilon$,
we see that the final value of $\mathbf{H}(Z_1, Z_2)$ is bounded by a finite (but extremely
large) quantity $O_{m,F,\varepsilon}(1)$, or more explicitly the value obtained after $m/\varepsilon$
iterations of the map $M \mapsto M + m F(M)$ applied to 0. □
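For very small ranges the entropy incrementation algorithm can be run by exhaustive search, since a random variable determined by $X_i$ is, up to relabelling, the same thing as a partition of the range of $X_i$. The sketch below is a brute-force illustration under that assumption; the function names, the constant choice of $F$ and the toy distribution are all hypothetical, and the search is exponential in the size of the ranges.

```python
from math import log2

def set_partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part

def labelings(values):
    """Every function on `values`, up to relabelling its outputs."""
    for part in set_partitions(list(values)):
        yield {v: k for k, block in enumerate(part) for v in block}

def determines(fine, coarse):
    """True if `coarse` is a function of `fine` (both are dicts on the same values)."""
    seen = {}
    return all(seen.setdefault(fine[v], coarse[v]) == coarse[v] for v in fine)

def entropy(pmf, key):
    """Entropy in bits of key(outcome) when outcome is drawn from pmf."""
    marg = {}
    for w, p in pmf.items():
        marg[key(w)] = marg.get(key(w), 0.0) + p
    return sum(p * log2(1 / p) for p in marg.values() if p > 0)

def entropy_increment(pmf, F, eps):
    """Brute-force Steps 0-2 for a pmf on triples (x1, x2, y); F must be positive."""
    xs1 = sorted({w[0] for w in pmf})
    xs2 = sorted({w[1] for w in pmf})
    H_z = lambda f1, f2: entropy(pmf, lambda w: (f1[w[0]], f2[w[1]]))
    H_y_given = lambda f1, f2: (
        entropy(pmf, lambda w: (w[2], f1[w[0]], f2[w[1]])) - H_z(f1, f2))
    z1, z2 = {v: 0 for v in xs1}, {v: 0 for v in xs2}       # Step 0
    while True:
        scale = F(H_z(z1, z2))
        candidates = [(f1, f2)
                      for f1 in labelings(xs1) if determines(f1, z1)
                      for f2 in labelings(xs2) if determines(f2, z2)]
        f1, f2 = min(candidates,                               # Step 1: minimize (16)
                     key=lambda c: H_y_given(*c) + H_z(*c) / scale)
        if H_y_given(z1, z2) - H_y_given(f1, f2) > eps:        # Step 2
            z1, z2 = f1, f2
        else:
            return z1, z2, f1, f2

# Hypothetical toy input: X1 uniform on {0,1,2,3}, X2 a fair bit, Y = (X1 mod 2) xor X2.
pmf = {(x1, x2, (x1 % 2) ^ x2): 1 / 8 for x1 in range(4) for x2 in range(2)}
z1, z2, f1, f2 = entropy_increment(pmf, F=lambda h: 100.0, eps=0.1)
print(z1, z2)
```

In this toy case the coarse approximation should retain only the parity of $X_1$ together with $X_2$, which is exactly the information needed to determine $Y$; the penalty term discourages any finer partition.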

To pass from an entropy formulation to an expectation formulation, we
need a way to pass from control of entropy to control of expectations. A
clue to how to do this is provided by the following observation: if $Y \mapsto Y'$
and $\mathbf{I}(X : Y|Y') = 0$, then $X$ and $Y$ are independent conditionally on $Y'$. In
particular, if $X$ takes values in a vector space, this implies that $\mathbf{E}(X|Y) = \mathbf{E}(X|Y')$.
In other words, whenever $\mathbf{I}(X : Y|Y') = \mathbf{H}(X|Y') - \mathbf{H}(X|Y)$ is
zero, so is $\mathbf{E}(X|Y') - \mathbf{E}(X|Y)$. This may help motivate the following lemma,
which is a perturbative version of the above observation.

Lemma 4.4 (Relation between entropy and expectation). Let $X, Y, Y'$ be
discrete random variables with $Y \mapsto Y'$, and with $X$ taking values in the
interval $\{-1 \le x \le 1\}$. Then we have

$\mathbf{E}(|\mathbf{E}(X|Y') - \mathbf{E}(X|Y)|) \le 2\,\mathbf{I}(X : Y|Y')^{1/2}$.

More informally, this lemma asserts that approximate conditional
independence in the entropy sense implies approximate conditional independence
in an expectation sense. The bound $2\,\mathbf{I}(X : Y|Y')^{1/2}$ is not best possible,
but any bound which decays to zero as $\mathbf{I}(X : Y|Y') \to 0$ will be sufficient
for our purposes.

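Proof. The basic idea is to exploit the observation that the function $x \log \frac{1}{x}$
is not only concave but also strictly concave on $[0, 1]$. Let us first verify
the lemma in the special case when $Y'$ is deterministic (so the hypothesis
$Y \mapsto Y'$ is vacuous), thus we wish to prove

$\mathbf{E}(|\mathbf{E}(X) - \mathbf{E}(X|Y)|) \le 2\,\mathbf{I}(X : Y)^{1/2}$.

Let $-1 \le x_1, \ldots, x_n \le 1$ be the essential range of $X$, and let $y_1, \ldots, y_m$ be
the essential range of $Y$. For any $1 \le i \le n$ and $1 \le j \le m$, define the
probabilities

$p_{ij} := \mathbf{P}(X = x_i | Y = y_j)$
$q_j := \mathbf{P}(Y = y_j)$
$p_i := \sum_{j=1}^m q_j p_{ij} = \mathbf{P}(X = x_i)$.

Then we observe that $0 \le p_{ij}, q_j \le 1$ and that $\sum_{j=1}^m q_j = 1$. If we define
$f : [0, 1] \to \mathbf{R}$ to be the function $f(x) := -x \log x$ (with the convention
$f(0) := 0$), we thus have

$\mathbf{I}(X : Y) = \mathbf{H}(X) - \mathbf{H}(X|Y) = \sum_{i=1}^n \sum_{j=1}^m q_j (f(p_i) - f(p_{ij}))$.

Now observe that $f$ is concave, indeed we have $f''(x) = -1/x$ for all $x \in (0, 1]$.
Thus by Taylor's theorem with remainder,

$f(p_{ij}) \le f(p_i) + f'(p_i)(p_{ij} - p_i) - \frac{1}{2}(p_{ij} - p_i)^2 / p^*_{ij}$

where $p^*_{ij}$ is a quantity between $p_{ij}$ and $p_i$. Inserting this into the preceding
estimate and noting that $\sum_{j=1}^m q_j (p_{ij} - p_i) = 0$, we conclude that

$\mathbf{I}(X : Y) \ge \frac{1}{2} \sum_{j=1}^m \sum_{i=1}^n q_j \frac{(p_{ij} - p_i)^2}{p^*_{ij}}$.

Now we compute using the boundedness of $x_i$ and Cauchy-Schwarz, as well
as the crude estimate $p^*_{ij} \le p_i + p_{ij}$,

$\mathbf{E}(|\mathbf{E}(X) - \mathbf{E}(X|Y)|) = \sum_{j=1}^m q_j |\mathbf{E}(X) - \mathbf{E}(X|Y = y_j)|$

$= \sum_{j=1}^m q_j \left| \sum_{i=1}^n x_i (p_i - p_{ij}) \right|$

$\le \sum_{j=1}^m \sum_{i=1}^n q_j |p_i - p_{ij}|$

$\le \left[ \sum_{j=1}^m \sum_{i=1}^n q_j \frac{(p_{ij} - p_i)^2}{p^*_{ij}} \right]^{1/2} \left[ \sum_{j=1}^m \sum_{i=1}^n q_j p^*_{ij} \right]^{1/2}$

$\le \left[ 2\,\mathbf{I}(X : Y) \sum_{j=1}^m q_j \sum_{i=1}^n (p_i + p_{ij}) \right]^{1/2}$

$= 2\,\mathbf{I}(X : Y)^{1/2}$.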
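Now we consider the general case when $Y'$ is not deterministic. In that case
we write

$\mathbf{E}(|\mathbf{E}(X|Y') - \mathbf{E}(X|Y)|) = \sum_{y'} \mathbf{P}(Y' = y')\, \mathbf{E}(|\mathbf{E}(X|Y' = y') - \mathbf{E}(X|Y; Y' = y')|)$,

where the outer expectation on the right-hand side is taken conditionally on the event $Y' = y'$.
(Here we have taken advantage of the hypothesis $Y \mapsto Y'$.) Applying the
preceding computation, we conclude

$\mathbf{E}(|\mathbf{E}(X|Y') - \mathbf{E}(X|Y)|) \le \sum_{y'} \mathbf{P}(Y' = y')\, 2\,\mathbf{I}(X : Y|Y' = y')^{1/2}$.

Applying Cauchy-Schwarz again we conclude

$\mathbf{E}(|\mathbf{E}(X|Y') - \mathbf{E}(X|Y)|) \le 2 \left( \sum_{y'} \mathbf{P}(Y' = y')\, \mathbf{I}(X : Y|Y' = y') \right)^{1/2}$

$= 2\,\mathbf{I}(X : Y|Y')^{1/2}$

as desired. □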
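The inequality of Lemma 4.4 is easy to test numerically in the special case where $Y'$ is deterministic. The sketch below takes a joint probability table for $(X, Y)$ and returns both sides of the inequality; the table, the values of $X$ and the function name are hypothetical, and the mutual information is computed in bits through the standard formula $\sum_{i,j} \mathbf{P}(x_i, y_j) \log_2 \frac{\mathbf{P}(x_i, y_j)}{\mathbf{P}(x_i)\mathbf{P}(y_j)}$, which agrees with $\mathbf{H}(X) - \mathbf{H}(X|Y)$.

```python
from math import log2, sqrt

def check_lemma(joint, xs):
    """Return (E|E(X) - E(X|Y)|, 2 I(X:Y)^(1/2)) for joint[i][j] = P(X = xs[i], Y = y_j)."""
    q = [sum(row[j] for row in joint) for j in range(len(joint[0]))]  # P(Y = y_j)
    p = [sum(row) for row in joint]                                   # P(X = x_i)
    mean = sum(x * pi for x, pi in zip(xs, p))
    # q_j |E(X) - E(X|Y = y_j)| = |q_j E(X) - sum_i x_i P(X = x_i, Y = y_j)|
    lhs = sum(abs(mean * q[j] - sum(x * row[j] for x, row in zip(xs, joint)))
              for j in range(len(q)))
    info = sum(pij * log2(pij / (p[i] * q[j]))
               for i, row in enumerate(joint)
               for j, pij in enumerate(row) if pij > 0)
    return lhs, 2 * sqrt(max(info, 0.0))

# Hypothetical example: X takes the values +1 and -1 and is correlated with a fair bit Y.
lhs, rhs = check_lemma([[0.4, 0.1], [0.1, 0.4]], xs=[1, -1])
print(round(lhs, 2), round(rhs, 2))  # 0.6 1.05
```

In this example the left-hand side is 0.6 and the right-hand side is about 1.05, consistent with the remark that the constant in the lemma is not best possible.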

By combining this with Lemma 4.3 it is possible to give a statement closely
resembling Theorem 2.11 and which is also sufficient to imply Theorem 2.3.
We omit the details.